-
- DIACONVERTOR version 1.0 (c)1996 William H. Oldacre
- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
- Filters accent marks, extended ASCII, and control characters from
- foreign language text files to permit emailing and wordprocessing.
-
- DiaConvertor is shareware. It is not free software, and it is not in the
- public domain. However, you are welcome to try it out and give copies away.
- Please see the file LICENSE.DOC, which accompanies DiaConvertor and this
- document.
-
- What It Does
- ------------
- DiaConvertor intelligently removes accent (diacritical) marks from text
- generated by foreign language translation programs. It also filters out any
- other extended ASCII graphic and control characters that email channels and
- word processors choke on. Such characters turn up in several ways: optical
- character recognition (OCR) programs often insert spurious characters into
- text, and printers print garbage unless they receive special codes telling
- them to print control characters and extended ASCII. Removing them by hand,
- even with the best word processors, can take hours. DiaConvertor cleans such
- files in seconds, leaving the document ready to use, often with no editing
- whatsoever. Ordinary punctuation and overall text geometry are not disturbed.
-
- For example, assume your document contains these characters:
-
- ╔═════°════════════════════════φ
- ∞ ║ GRAPHICδCHARACTERS JAM EMAIL ║
- ° ╚══════════════════════════════╝
- Σ Çava français, découvrissions près d'oùε
- √ espérons êtes sûrs l'été très occupés à!
- âäàåáªÄÅéêëèÉïîìíôöòóºÖüûùúÜçÇñÑÿ,æ,Æ
-
- After DiaConversion they will look like this:
-
- GRAPHIC CHARACTERS JAM EMAIL
-
- Cava francais, decouvrissions pres d'ou
- esperons etes surs l'ete tres occupes a!
- aaaaaaAAeeeeEiiiioooooOuuuuUcCnNy,ae,AE
-
- Tutorial
- --------
- DiaConvertor operates on DOS text files containing ASCII characters, control
- characters, and extended ASCII characters. It should not be used directly
- on word processor or language translator files that are in a proprietary
- document format. For example, WordPerfect files may end with .WP or .WPG and
- WordStar files with .WS. Such files contain binary formatting information
- that will be destroyed. Most word processors and translators have a means
- for creating text files in ASCII. Usually they will refer to this as "ASCII
- format" or "DOS text format" or something similar. Examine your menus and
- experiment. Most ASCII files will end with the extension .ASC, .TXT, or
- .DOC. To check, use this command line to examine the file:
-
- TYPE filename | MORE
-
- The file should be easily readable without a lot of binary gibberish mixed
- with the text. When in doubt, make a copy of the file you want to process in
- a temporary directory and experiment.
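
The same eyeball check can be roughed out in a script. The sketch below is only an illustration and is not part of DiaConvertor: the function name looks_like_text and the 5% threshold are assumptions made up for this example. It counts control bytes other than tab, line feed, carriage return, and Ctrl-Z (the DOS end-of-file marker), which are rare in plain text but common in word processor files. Accented letters and other extended ASCII are not counted, since those are exactly what you expect in a file headed for DiaConvertor.

```python
def looks_like_text(data: bytes, threshold: float = 0.05) -> bool:
    """Guess whether data is plain DOS text. This is a heuristic, not a guarantee."""
    if not data:
        return True
    # Control codes that normally appear in DOS text files:
    # tab (9), line feed (10), carriage return (13), Ctrl-Z (26).
    allowed = {9, 10, 13, 26}
    suspicious = sum(1 for b in data if b < 32 and b not in allowed)
    return suspicious / len(data) <= threshold


print(looks_like_text(b"Bonjour tout le monde\r\n"))  # True
print(looks_like_text(bytes(range(32))))              # False
```

A file that fails this test is probably in a proprietary format and should be exported to ASCII first.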
-
- Used from the DOS command line, DiaConvertor employs "redirection" to process
- files. This provides the user with maximum flexibility. If you aren't very
- familiar with this subject, don't worry. It's easy. You just type DIACON with
- some arrows which tell your computer where to get data from and where to send
- it:
-
- DIACON <infile >outfile
-
- Notice the arrow in front of "infile" points -towards- DIACON. That feeds the
- file you want processed into DiaConvertor. The arrow pointing away from
- DIACON (towards "outfile") tells your computer what file you want the results
- stored in. For example, assume you have a file, FRENCH.TXT, which needs
- cleaning up. To put the results in another file, FRENCH.CLN, just type the
- following:
-
- DIACON <FRENCH.TXT >FRENCH.CLN
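
If you are curious what a program on the receiving end of those arrows looks like, here is a minimal sketch of a "filter" in Python. It is not DiaConvertor's source code. The names run_filter and blank_high_bytes are invented for this example, and the conversion shown is a crude stand-in that simply blanks every byte from 128 to 255.

```python
import io


def blank_high_bytes(data: bytes) -> bytes:
    """Stand-in conversion: replace every byte 128-255 with a space."""
    return bytes(b if b < 128 else 32 for b in data)


def run_filter(src, dst, convert):
    """Read all bytes from src, convert them, and write the result to dst."""
    dst.write(convert(src.read()))


# In-memory streams stand in for the redirected input and output files.
src = io.BytesIO(b"caf\x82 au lait")
dst = io.BytesIO()
run_filter(src, dst, blank_high_bytes)
print(dst.getvalue())  # b'caf  au lait'
```

A real filter would pass sys.stdin.buffer and sys.stdout.buffer in place of the two in-memory streams. The program never needs to know file names, because the operating system connects those streams to whatever the arrows point at.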
-
- The arrows tell DOS where to get DIACON's data from and where to put it after
- processing. The following command line, which copies a file onto itself, will
- not work and should generate an error message:
-
- DIACON <FRENCH.TXT >FRENCH.TXT
-
- WARNING: the following command line will cause DOS to erase your original
- file. This is a function of your DOS operating system, not DiaConvertor.
-
- (not safe) DIACON >FRENCH.TXT
-
- It's good policy to always work with a copy of your document, not the
- original, in case you make a mistake. If you want to be sure not to erase
- anything, always type the right-facing arrow twice. This causes DOS to add
- data to the end of an existing file instead of overwriting it:
-
- (safer) DIACON <FRENCH.TXT >>FRENCH.CLN
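
The difference between one arrow and two is the same as the difference between opening a file for writing and opening it for appending. The sketch below illustrates this with Python's "w" and "a" file modes, used here as an analogy for > and >>. The file name is borrowed from the example above and the file is created in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "FRENCH.CLN")

    with open(path, "w") as f:   # like >FRENCH.CLN : start from an empty file
        f.write("first run\n")
    with open(path, "w") as f:   # a second > throws the first result away
        f.write("second run\n")
    with open(path, "a") as f:   # like >>FRENCH.CLN : keep what is there, add to the end
        f.write("third run\n")

    with open(path) as f:
        contents = f.read()

print(repr(contents))  # 'second run\nthird run\n'
```

Note the flip side of appending: if the output file already holds an earlier result, the new output lands after it, so check the file before you use it.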
-
- Perhaps you would like to see what the results will look like before you
- begin processing a large file. Here is where redirection really shines:
-
- DIACON <FRENCH.TXT | MORE
-
- This tells DOS to provide DIACON's input from FRENCH.TXT and channel output
- to the computer screen. The MORE command causes the screen to pause so you
- can view the results a page at a time. Press the RETURN key to see the next
- screen. When you've seen enough, press CTRL-C to quit.
-
- For real muscle, you can use the pipeline symbol, "|", to chain DIACON to a
- string of other programs. Pipelining is a very powerful technique. Suppose
- you have three programs and want to send the output from PROG1 to PROG2 and
- then to PROG3. Your command line will look something like this:
-
- PROG1 <infile | PROG2 | PROG3 >outfile
-
- The following command line extracts LETTERS.TXT from a compressed file,
- GASTON1.ZIP, then feeds it through DIACON. JUSTIFY squares up the text lines
- and, finally, a printing program, PR, inserts the date, time, and page
- numbers. The arrow tells DOS to put the results in a file, READY2GO.TXT.
-
- UNZIP -p GASTON1.ZIP LETTERS.TXT | DIACON | JUSTIFY | PR >READY2GO.TXT
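
The idea behind a pipeline can be sketched in a few lines of Python. This is an illustration only: the two one-line "programs" below (UPPER and REVERSE) and the helper names stage and pipeline are invented stand-ins for PROG1, PROG2, and so on. Each stage is run as a separate process, and the output of one becomes the input of the next.

```python
import subprocess
import sys

# Two tiny stand-in filters, each run as its own process.
UPPER = "import sys; sys.stdout.write(sys.stdin.read().upper())"
REVERSE = "import sys; sys.stdout.write(sys.stdin.read()[::-1])"


def stage(code: str) -> list:
    """Build a command line that runs a one-line Python filter."""
    return [sys.executable, "-c", code]


def pipeline(data: bytes, *commands) -> bytes:
    """Feed data through each command in turn, like PROG1 | PROG2 | PROG3."""
    for cmd in commands:
        done = subprocess.run(cmd, input=data, stdout=subprocess.PIPE, check=True)
        data = done.stdout
    return data


result = pipeline(b"abc", stage(UPPER), stage(REVERSE))
print(result)  # b'CBA'
```

The stages here run one after another, each finishing before the next begins. Any program that reads standard input and writes standard output can take a place in the chain.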
-
- To test exactly what effect DiaConvertor will have on certain characters,
- type ECHO on the command line. Type a space, then hold down the ALT key and
- enter the ASCII decimal values of your characters on the numeric keypad (see
- ASCII charts) followed by the pipeline symbol, "|". Type DIACON after the
- pipeline and press ENTER. DOS pipes the ECHOed characters to DiaConvertor,
- which sends the conversions to the screen. (This may not work on ASCII values
- below 32.) Your session might look like this:
-
- C>ECHO àäâ/æ | DIACON
- aaa/ae
-
- C>
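
If you don't have the ASCII charts handy, the decimal values to type on the keypad can be looked up with a short script. The sketch below assumes your DOS machine uses IBM code page 437, which matches the extended ASCII chart at the end of this document. The function name cp437_codes is made up for the example.

```python
def cp437_codes(text: str) -> list:
    """Return the decimal code of each character in IBM code page 437."""
    return list(text.encode("cp437"))


print(cp437_codes("àäâ/æ"))  # [133, 132, 131, 47, 145]
```

So the session above was typed as ALT-133, ALT-132, ALT-131, a slash, and ALT-145. A character that does not exist in code page 437 raises an error instead of returning a number.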
-
- Getting Help:
- -------------
- DiaConvertor is as friendly as it is fast. If you seem to be having trouble
- or make a mistake on the command line, DIACON pops up a screen with
- information to help you. To call it up, just type DIACON followed by a space
- and a question mark (or just about anything else):
-
- DIACON ? or DIACON /? or DIACON -? or DIACON -h or DIACON HELP, etc.
-
- How It Works
- ------------
- ASCII stands for the American Standard Code for Information Interchange.
- Basically, it uses numbers to represent all of the keys you see on a standard
- keyboard. The ASCII characters you are viewing now are actually numbers
- between 32 and 126 in your computer's memory. "Control characters", like
- carriage return, line feed, backspace, tab, form feed, escape, and others,
- occupy the range from 0 to 31, plus 127 (delete). But you can actually
- represent 256 different values within one computer byte. Why not take
- advantage of this to incorporate some special symbols? When IBM built their
- first personal computers, they extended the ASCII character set to include
- values from 128 to 255 and filled them with accented letters and useful
- graphic symbols.
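
The three ranges can be written down as a small function. This is only a sketch to make the boundaries concrete, and the name classify is invented for it. Value 127 (delete) is grouped with the control characters here, as the "del" entry in the ASCII chart at the end of this document suggests.

```python
def classify(code: int) -> str:
    """Say which range of the 256 possible byte values a code falls in."""
    if not 0 <= code <= 255:
        raise ValueError("a byte holds values 0-255 only")
    if code < 32 or code == 127:
        return "control"
    if code < 128:
        return "printable ASCII"
    return "extended ASCII"


for code in (13, 65, 127, 130):
    print(code, classify(code))
```

Carriage return (13) and delete (127) come out as control characters, the letter A (65) as printable ASCII, and 130 (é on an IBM PC) as extended ASCII.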
-
- Unfortunately, most email systems run on pure ASCII (7 bits) and can't handle
- the full 8-bit range of values. Even some excellent word processors, like
- WordStar, become dangerously confused by such data because they use the "high
- bit" for their own purposes. Worse, some embedded control characters can have
- the same effect. You can tell something's wrong: either some characters are
- missing, your whole document gets torn to shreds, or your computer hangs and
- the document is lost.
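
To see why a 7-bit channel does so much damage, consider what happens when the high bit is simply thrown away. The sketch below simulates that one failure mode; real mail systems vary, and some reject or replace such bytes instead. The function name strip_high_bit is invented for the example, and code page 437 is assumed for the accented letters.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Simulate a 7-bit channel by clearing the top bit of every byte."""
    return bytes(b & 0x7F for b in data)


original = "été".encode("cp437")
damaged = strip_high_bit(original)
print(list(original))  # [130, 116, 130]
print(list(damaged))   # [2, 116, 2]
```

The two accented letters (130) have turned into control character 2, which is exactly the kind of embedded control code that confuses word processors and printers.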
-
- DiaConvertor searches a document for control characters (0 - 31) and replaces
- everything except carriage return (13) and line feed (10) with spaces. Tabs
- (9) are replaced with eight spaces (which is what they translate to on your
- computer screen). Extended ASCII characters (128 - 255) are intelligently
- replaced either with spaces or regular ASCII alphabet characters. This
- preserves document readability and layout. While DiaConvertor was programmed
- primarily to resolve problems with diacritical marks, it works well with any
- text file. Once DiaConverted, almost any file can be loaded into your word
- processor for final editing.
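
The rules just described can be approximated in Python as follows. This is a sketch under stated assumptions, not DiaConvertor's actual code or its actual replacement table. It assumes the input is in IBM code page 437, and it uses Unicode decomposition to find the unaccented base letter, a technique swapped in for whatever lookup the program really uses. The names SPECIAL, build_table, and diaconvert are invented. The four special cases are taken from the sample conversion near the top of this document. Any extended character without an obvious plain letter becomes a space here, which may differ from the real program.

```python
import unicodedata

# Replacements that decomposition alone does not give (from the sample output).
SPECIAL = {"æ": "ae", "Æ": "AE", "ª": "a", "º": "o"}


def build_table() -> dict:
    """Map each of the 256 byte values to its replacement bytes."""
    table = {}
    for b in range(256):
        if b in (10, 13):            # line feed and carriage return are kept
            table[b] = bytes([b])
        elif b == 9:                 # tab becomes eight spaces
            table[b] = b" " * 8
        elif b < 32:                 # every other control character becomes a space
            table[b] = b" "
        elif b < 128:                # ordinary ASCII passes through untouched
            table[b] = bytes([b])
        else:                        # extended ASCII: plain letter(s) or a space
            ch = bytes([b]).decode("cp437")
            if ch in SPECIAL:
                table[b] = SPECIAL[ch].encode("ascii")
            else:
                base = unicodedata.normalize("NFD", ch)[0]
                plain = base.isascii() and base.isalpha()
                table[b] = base.encode("ascii") if plain else b" "
    return table


TABLE = build_table()


def diaconvert(data: bytes) -> bytes:
    """Apply the replacement table to every byte of data."""
    return b"".join(TABLE[b] for b in data)


print(diaconvert("Ça va, très occupés".encode("cp437")))  # b'Ca va, tres occupes'
```

Building the table once and then doing a single lookup per byte keeps the conversion fast even on large files. Because most replacements are one character for one character, line lengths and columns stay where they were.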
-
- System (minimum) Requirements:
- ------------------------------
- IBM compatible computer with 8088 or higher processor, 256K RAM, 1 floppy
- drive, MSDOS or PCDOS 3.3 or higher, monochrome or better display. Optimized
- for speed, DiaConvertor was designed to run on virtually any DOS computer.
-
- A Personal Message About Shareware
- ----------------------------------
- Shareware is full featured software distributed on the honor system. You are
- welcome to try it out or give it to friends. But if you continue to use it
- past a reasonable trial period (in this case 30 days) you are expected to pay
- for it. Please give this careful consideration.
-
- I can think of no other system which could have put so much good software in
- the hands of computer users conveniently and at such low cost. If the
- shareware concept degrades into nothing more than a way to distribute
- crippled demos (ransomware), it will have happened because too many people
- didn't bother to support those who invest their time in it. For every piece
- of shareware out there, there is an author waiting to hear whether you liked
- it and what improvements you would like to see. Their hearts sink a little
- every day when those who use the software don't honor their request to pay.
-
- The advantages of shareware to the computer user are tremendous. First, users
- save on distribution fees (fancy looking boxes and all that marketing cost a
- fortune). And when you are in dire need of a problem solving utility (usually
- 1:00 Sunday morning), you only have to log onto a computer service or the
- Internet to download the solution. If you need expert help, you can usually
- get the actual author on the telephone without voicemail (try that with any
- of that plastic wrapped software).
-
- It's after 2:00 A.M. as I sit working on this document. My eyes are almost as
- strained as my finances. I'm doing this because I want someone else to
- benefit from the software I've written. The small amount of money I receive
- will be a pleasant reminder that there are people of honor out there who
- recognize a good thing when they see it. The shareware concept is in both our
- best interests, yours and mine. Please support it in spirit and in deed.
-
- Remember, good software doesn't cost money or time ----- it saves it.
-
- Custom Versions, Site Licenses
- ------------------------------
- Custom modifications of DiaConvertor are available. If your organization has
- a special problem and a modified DiaConvertor might solve it, please feel
- free to contact me. Also, site licenses can be arranged at very low cost. You
- can put a copy of DiaConvertor on every computer in your company or
- department and still save money. Nothing costs more than an employee's time
- spent on needless manual editing.
-
- To Contact The Author
- ---------------------
-
- William H. Oldacre
- P.O. Box 12951
- Gainesville, Florida 32604
- U.S.A.
-
- Voice phone: 1-352-332-3010
- Email: 76114.2307@compuserve.com
-
- Acknowledgements
- ----------------
- Special thanks to Daniel Boily and Sonia Normandin of Quebec for their
- friendship and sufferance of what I have done to their beautiful French.
- Thanks most of all to my wife, Sherlee, for still loving me when the hour
- becomes very late at the computer.
-
- ┌─────────────────────────────────┐
- │ ASCII CHART │
- ┌───────────────────────┼────────────────┬────────────────┼────────────────┐
- │ dec oct hex sym chr │dec oct hex chr │dec oct hex chr │ dec oct hex chr│
- ├───────────────────────┼────────────────┼────────────────┼────────────────┤
- │ 0 000 00 ^@ null np │ 32 040 20 sp │ 64 100 40 @ │ 96 140 60 ` │
- │ 1 001 01 ^A soh │ 33 041 21 ! │ 65 101 41 A │ 97 141 61 a │
- │ 2 002 02 ^B stx │ 34 042 22 " │ 66 102 42 B │ 98 142 62 b │
- │ 3 003 03 ^C etx │ 35 043 23 # │ 67 103 43 C │ 99 143 63 c │
- │ 4 004 04 ^D eot │ 36 044 24 $ │ 68 104 44 D │ 100 144 64 d │
- │ 5 005 05 ^E enq │ 37 045 25 % │ 69 105 45 E │ 101 145 65 e │
- │ 6 006 06 ^F ack │ 38 046 26 & │ 70 106 46 F │ 102 146 66 f │
- │ 7 007 07 ^G bel │ 39 047 27 ' │ 71 107 47 G │ 103 147 67 g │
- │ 8 010 08 ^H bs │ 40 050 28 ( │ 72 110 48 H │ 104 150 68 h │
- │ 9 011 09 ^I ht np │ 41 051 29 ) │ 73 111 49 I │ 105 151 69 i │
- │ 10 012 0A ^J lf np │ 42 052 2A * │ 74 112 4A J │ 106 152 6A j │
- │ 11 013 0B ^K vt │ 43 053 2B + │ 75 113 4B K │ 107 153 6B k │
- │ 12 014 0C ^L ff │ 44 054 2C , │ 76 114 4C L │ 108 154 6C l │
- │ 13 015 0D ^M cr np │ 45 055 2D - │ 77 115 4D M │ 109 155 6D m │
- │ 14 016 0E ^N so │ 46 056 2E . │ 78 116 4E N │ 110 156 6E n │
- │ 15 017 0F ^O si │ 47 057 2F / │ 79 117 4F O │ 111 157 6F o │
- │ 16 020 10 ^P dle │ 48 060 30 0 │ 80 120 50 P │ 112 160 70 p │
- │ 17 021 11 ^Q dc1 │ 49 061 31 1 │ 81 121 51 Q │ 113 161 71 q │
- │ 18 022 12 ^R dc2 │ 50 062 32 2 │ 82 122 52 R │ 114 162 72 r │
- │ 19 023 13 ^S dc3 │ 51 063 33 3 │ 83 123 53 S │ 115 163 73 s │
- │ 20 024 14 ^T dc4 │ 52 064 34 4 │ 84 124 54 T │ 116 164 74 t │
- │ 21 025 15 ^U nak │ 53 065 35 5 │ 85 125 55 U │ 117 165 75 u │
- │ 22 026 16 ^V syn │ 54 066 36 6 │ 86 126 56 V │ 118 166 76 v │
- │ 23 027 17 ^W etb │ 55 067 37 7 │ 87 127 57 W │ 119 167 77 w │
- │ 24 030 18 ^X can │ 56 070 38 8 │ 88 130 58 X │ 120 170 78 x │
- │ 25 031 19 ^Y em │ 57 071 39 9 │ 89 131 59 Y │ 121 171 79 y │
- │ 26 032 1A ^Z sub np │ 58 072 3A : │ 90 132 5A Z │ 122 172 7A z │
- │ 27 033 1B ^[ esc │ 59 073 3B ; │ 91 133 5B [ │ 123 173 7B { │
- │ 28 034 1C ^\ fs │ 60 074 3C < │ 92 134 5C \ │ 124 174 7C | │
- │ 29 035 1D ^] gs │ 61 075 3D = │ 93 135 5D ] │ 125 175 7D } │
- │ 30 036 1E ^^ rs │ 62 076 3E > │ 94 136 5E ^ │ 126 176 7E ~ │
- │ 31 037 1F ^_ us │ 63 077 3F ? │ 95 137 5F _ │ 127 177 7F del│
- ├───────────────────────┴────────────────┴────────────────┴────────────────┤
- │ ^ = control key depressed while character is typed. │
- │ np = non-printable, character in file will disturb loading or printing. │
- │ sp = space character. │
- │ │
- │ (c)1996 William H. Oldacre All rights reserved. │
- └──────────────────────────────────────────────────────────────────────────┘
-
- ┌─────────────────────────────────────┐
- │ EXTENDED ASCII CHART │
- ┌─────────────────┼──────────────────┬──────────────────┼──────────────────┐
- │ dec oct hex char│ dec oct hex char│ dec oct hex char│ dec oct hex char│
- ├─────────────────┼──────────────────┼──────────────────┼──────────────────┤
- │ 128 200 80 Ç │ 160 240 a0 á │ 192 300 c0 └ │ 224 340 e0 α │
- │ 129 201 81 ü │ 161 241 a1 í │ 193 301 c1 ┴ │ 225 341 e1 ß │
- │ 130 202 82 é │ 162 242 a2 ó │ 194 302 c2 ┬ │ 226 342 e2 Γ │
- │ 131 203 83 â │ 163 243 a3 ú │ 195 303 c3 ├ │ 227 343 e3 π │
- │ 132 204 84 ä │ 164 244 a4 ñ │ 196 304 c4 ─ │ 228 344 e4 Σ │
- │ 133 205 85 à │ 165 245 a5 Ñ │ 197 305 c5 ┼ │ 229 345 e5 σ │
- │ 134 206 86 å │ 166 246 a6 ª │ 198 306 c6 ╞ │ 230 346 e6 µ │
- │ 135 207 87 ç │ 167 247 a7 º │ 199 307 c7 ╟ │ 231 347 e7 τ │
- │ 136 210 88 ê │ 168 250 a8 ¿ │ 200 310 c8 ╚ │ 232 350 e8 Φ │
- │ 137 211 89 ë │ 169 251 a9 ⌐ │ 201 311 c9 ╔ │ 233 351 e9 Θ │
- │ 138 212 8a è │ 170 252 aa ¬ │ 202 312 ca ╩ │ 234 352 ea Ω │
- │ 139 213 8b ï │ 171 253 ab ½ │ 203 313 cb ╦ │ 235 353 eb δ │
- │ 140 214 8c î │ 172 254 ac ¼ │ 204 314 cc ╠ │ 236 354 ec ∞ │
- │ 141 215 8d ì │ 173 255 ad ¡ │ 205 315 cd ═ │ 237 355 ed φ │
- │ 142 216 8e Ä │ 174 256 ae « │ 206 316 ce ╬ │ 238 356 ee ε │
- │ 143 217 8f Å │ 175 257 af » │ 207 317 cf ╧ │ 239 357 ef ∩ │
- │ 144 220 90 É │ 176 260 b0 ░ │ 208 320 d0 ╨ │ 240 360 f0 ≡ │
- │ 145 221 91 æ │ 177 261 b1 ▒ │ 209 321 d1 ╤ │ 241 361 f1 ± │
- │ 146 222 92 Æ │ 178 262 b2 ▓ │ 210 322 d2 ╥ │ 242 362 f2 ≥ │
- │ 147 223 93 ô │ 179 263 b3 │ │ 211 323 d3 ╙ │ 243 363 f3 ≤ │
- │ 148 224 94 ö │ 180 264 b4 ┤ │ 212 324 d4 ╘ │ 244 364 f4 ⌠ │
- │ 149 225 95 ò │ 181 265 b5 ╡ │ 213 325 d5 ╒ │ 245 365 f5 ⌡ │
- │ 150 226 96 û │ 182 266 b6 ╢ │ 214 326 d6 ╓ │ 246 366 f6 ÷ │
- │ 151 227 97 ù │ 183 267 b7 ╖ │ 215 327 d7 ╫ │ 247 367 f7 ≈ │
- │ 152 230 98 ÿ │ 184 270 b8 ╕ │ 216 330 d8 ╪ │ 248 370 f8 ° │
- │ 153 231 99 Ö │ 185 271 b9 ╣ │ 217 331 d9 ┘ │ 249 371 f9 ∙ │
- │ 154 232 9a Ü │ 186 272 ba ║ │ 218 332 da ┌ │ 250 372 fa · │
- │ 155 233 9b ¢ │ 187 273 bb ╗ │ 219 333 db █ │ 251 373 fb √ │
- │ 156 234 9c £ │ 188 274 bc ╝ │ 220 334 dc ▄ │ 252 374 fc ⁿ │
- │ 157 235 9d ¥ │ 189 275 bd ╜ │ 221 335 dd ▌ │ 253 375 fd ² │
- │ 158 236 9e ₧ │ 190 276 be ╛ │ 222 336 de ▐ │ 254 376 fe ■ │
- │ 159 237 9f ƒ │ 191 277 bf ┐ │ 223 337 df ▀ │ 255 377 ff │
- ├─────────────────┴──────────────────┴──────────────────┴──────────────────┤
- │ Note: To generate these characters on a DOS computer, hold down the │
- │ Alt key and type the decimal number on the numeric keypad. It │
- │ will appear when the Alt key is released. │
- │ │
- │ (c)1996 William H. Oldacre All rights reserved. │
- └──────────────────────────────────────────────────────────────────────────┘
-
-